Conserved sequence

In biology, conserved sequences are similar or identical sequences that occur within nucleic acid sequences (such as RNA and DNA sequences), protein sequences, protein structures or polymeric carbohydrates, either across species (orthologous sequences) or within different molecules produced by the same organism (paralogous sequences). Cross-species conservation indicates that a particular sequence may have been maintained by evolution despite speciation. The further back up the phylogenetic tree a particular conserved sequence occurs, the more highly conserved it is said to be. Since sequence information is normally transmitted from parents to progeny by genes, a conserved sequence implies that there is a conserved gene.

It is widely believed that a mutation in a "highly conserved" region leads to a non-viable life form, or to a form that is eliminated through natural selection.


Conserved nucleic acid sequences

Highly conserved DNA sequences are thought to have functional value, although the role of many highly conserved non-coding DNA sequences is not understood. A 2007 study that deleted four highly conserved non-coding DNA sequences in mice yielded viable mice with no significant phenotypic differences; the authors described their findings as "unexpected".^[1]

Many regions of DNA, including highly conserved DNA sequences, consist of repeated sequence elements. One possible explanation of the null result above is that removing only one copy, or a subset, of a repeated sequence could preserve phenotypic function, on the assumption that one such sequence is sufficient and the repetitions are superfluous to essential life processes. The paper did not specify whether the eliminated sequences were repeated sequences.

The TATA promoter sequence is an example of a highly conserved DNA sequence, being found in most eukaryotes.
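As a rough illustration of how such a motif might be located in a DNA string, the following Python sketch scans for TATA-like sites. The pattern used here (TATA, then A or T, then A, then A or T) is a simplified consensus assumed for this example, and the function name and the test sequence are hypothetical; real promoter prediction relies on more than a literal pattern match.

```python
import re

# Assumed, simplified consensus for illustration only:
# T-A-T-A, then A or T, then A, then A or T.
# The lookahead lets overlapping candidate sites be reported.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_like_motifs(dna):
    """Return (0-based start, matched text) for each TATA-like site in dna."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in TATA_PATTERN.finditer(dna)]


# Hypothetical sequence invented for this example.
example = "GGCCTATAAAAGGCGCGTATATATGC"
print(find_tata_like_motifs(example))
```

A match only shows that the string resembles the assumed consensus; it does not by itself show that the site is functional or conserved.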

Conserved protein sequences and structures

Highly conserved proteins are often required for basic cellular function, stability or reproduction. Conservation of protein sequences is indicated by the presence of identical amino acid residues at analogous parts of proteins. Conservation of protein structures is indicated by the presence of functionally equivalent, though not necessarily identical, amino acid residues and structures between analogous parts of proteins.

An amino acid sequence alignment between two human zinc finger proteins, with GenBank accession numbers AAB24882 and AAB24881, illustrates this. The alignment was carried out using the ClustalW sequence alignment program. Conserved amino acid positions are marked by strings of * on the third line of the sequence alignment. The two proteins contain a number of conserved amino acid sequences (represented by identical letters aligned between the two sequences).
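The marking convention described above can be sketched in a few lines of Python. The two rows used here are short hypothetical fragments invented for illustration; they are not the AAB24882 and AAB24881 sequences, and the helper names are assumptions of this sketch rather than part of ClustalW. The sketch assumes the rows are already aligned (equal length, with "-" for gaps) and marks only identical columns.

```python
def identity_line(seq_a, seq_b, gap="-"):
    """Build a marker line with '*' under every column where both
    aligned rows carry the same residue (gap columns are never marked)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )


def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of alignment columns that are marked as identical."""
    marks = identity_line(seq_a, seq_b, gap)
    return 100.0 * marks.count("*") / len(marks) if marks else 0.0


# Hypothetical pre-aligned fragments, invented for this example.
row_a = "CKECGKAFSR-SSHL"
row_b = "CEECGKSFSQKSSHL"

print(row_a)
print(row_b)
print(identity_line(row_a, row_b))
print(f"{percent_identity(row_a, row_b):.1f}% identical columns")
```

Dividing by the full alignment length, gaps included, is one of several possible conventions for percent identity; other definitions divide by the shorter sequence or by the number of ungapped columns.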

Conserved polymeric carbohydrate sequences

The monosaccharide sequence of the glycosaminoglycan heparin is conserved across a wide range of species.

Biological role of sequence conservation

Sequence similarities serve as evidence for structural and functional conservation, as well as of evolutionary relationships between the sequences. Consequently, comparative analysis is the primary means by which functional elements are identified.
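A minimal sketch of this kind of comparative analysis, assuming the sequences have already been aligned: score each alignment column by how many sequences agree, then report runs of well-conserved columns as candidate functional elements. The scoring rule, the threshold, the minimum run length and the three toy sequences are all assumptions made for illustration; published methods use substitution models and phylogeny-aware statistics rather than a simple majority fraction.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """For each column, return the fraction of rows that share the
    most common non-gap symbol (1.0 means fully conserved)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have equal length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(sym for sym in column if sym != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


def conserved_blocks(scores, threshold=1.0, min_length=3):
    """Return half-open (start, end) ranges of consecutive columns
    whose score is at least threshold and that span min_length or more."""
    blocks, start = [], None
    # A sentinel below any valid score closes a run that reaches the end.
    for i, score in enumerate(scores + [-1.0]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical aligned sequences from three imaginary species.
alignment = [
    "ACGTTATAAAGC",
    "ACGATATAAACC",
    "TCGCTATAAAGG",
]
scores = column_conservation(alignment)
print(conserved_blocks(scores))
```

In this toy input the only run of fully conserved columns long enough to be reported covers columns 4 to 9, which is where the three sequences share the same six-letter stretch.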

Among the most highly conserved sequences are the active sites of enzymes and the binding sites of protein receptors.

Conserved non-coding sequences often harbor cis-regulatory elements, which constrain their evolution. Some deletions of highly conserved sequences in humans (hCONDELs) and other organisms have been suggested as a potential cause of the anatomical and behavioral differences between humans and other mammals.^[2]

References

^ Ahituv N, Zhu Y, Visel A, et al. (2007). "Deletion of ultraconserved elements yields viable mice". PLoS Biol. 5 (9): e234. doi:10.1371/journal.pbio.0050234. PMC 1964772. PMID 17803355. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=1964772.
^ McLean, C. Y.; Reno, P. L.; Pollen, A. A.; Bassan, A. I.; Capellini, T. D.; Guenther, C.; Indjeian, V. B.; Lim, X. et al. (2011). "Human-specific loss of regulatory DNA and the evolution of human-specific traits". Nature 471 (7337): 216–219. doi:10.1038/nature09774. PMC 3071156. PMID 21390129. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3071156.

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997). "The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools". Nucleic Acids Research 25: 4876–4882.
http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pbio.0050253